An Introduction to Python Dashboards

Marc Dotson

2025-02-28

  • Provide some guidance for the datathon
  • Focus on Python beginners
  • Assume you’ve installed Python (and an IDE)
  • Focus is on three specific data tools
  • Special focus on human-readable code
  • Need help? Raise your hand or talk after

Polars

  • New data wrangling library
  • Alternative to Pandas for using DataFrames
  • Fast: written in Rust, built on Apache Arrow, designed to parallelize and use GPUs, and supports lazy evaluation
  • More consistent syntax than Pandas
  • Name is an anagram of its query engine type (OLAP) and Rust (rs)

Filter, slice, and sort observations

Polars syntax follows a human-readable, SQL-like grammar

  • Install with pip install polars
  • Parameters for .slice() are the start index (zero-based) and the number of rows to return
  • We use pl.col() to reference variables in our data

import polars as pl
import os

# Read the data and check its dimensions and column names
customer_data = pl.read_csv(os.path.join('data', 'customer_data.csv'))
customer_data.shape
customer_data.columns

# Keep rows that match a condition; multiple conditions are combined with AND
customer_data.filter(pl.col('college_degree') == 'Yes')
customer_data.filter(pl.col('region') != 'West')
customer_data.filter(pl.col('gender') != 'Female', pl.col('income') > 70000)

# Return the first five rows
customer_data.slice(0, 5)

# Sort ascending (the default) or descending
customer_data.sort(pl.col('birth_year'))
customer_data.sort(pl.col('birth_year'), descending = True)

Select and recode/create variables, join data frames

Polars keeps row and column operations separate: .filter() picks observations, .select() picks variables

# Two equivalent ways to select columns
customer_data.select(pl.col('region'), pl.col('review_text'))
customer_data.select(pl.col(['region', 'review_text']))

# Add a new column: income in thousands
customer_data.with_columns(income_new = pl.col('income') / 1000)

store_transactions = pl.read_csv(os.path.join('data', 'store_transactions.csv'))

# Left join keeps every customer; inner join keeps only customers with a match in store_transactions
customer_data.join(store_transactions, on = 'customer_id', how = 'left')
customer_data.join(store_transactions, on = 'customer_id', how = 'inner')

Chain methods to supercharge human-readability

Polars embraces method chaining to improve efficiency

  • The entire chain needs to be surrounded with ( )
  • Each line starts with .
  • You run the whole block of code at once
  • The consistent syntax can be read like a sentence

(customer_data
  .join(store_transactions, on = 'customer_id', how = 'left')
  # .max() is computed over all joined rows, not only those in the West
  .filter(pl.col('region') == 'West', pl.col('feb_2005') == pl.col('feb_2005').max())
  .with_columns(age = 2025 - pl.col('birth_year'))
  .select(pl.col(['age', 'feb_2005']))
  .sort(pl.col('age'), descending = True)
  .slice(0, 1)
)

Summarize variables, including grouped summaries

Human-readable code is designed to be consistent at the cost of being verbose

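# Overall mean of a single variable
(customer_data
  .select(pl.col('income'))
  .mean()
)

# Count observations in each region and college degree combination
(customer_data
  .group_by(pl.col(['region', 'college_degree']))
  .agg(n = pl.len())
)

# Grouped means, sorted from highest to lowest average income
(customer_data
  .group_by(pl.col(['gender', 'region']))
  .agg(
    avg_income = pl.col('income').mean(),
    avg_credit = pl.col('credit').mean()
  )
  .sort(pl.col('avg_income'), descending = True)
)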